0.1 What is the purpose of SEAS?

Statistical Enrichment Analysis of Samples (SEAS) is a tool for finding which clinical (metadata) attributes are enriched within a subset of samples. For example, SEAS answers questions such as:

  • I have population data with brain cancer survival times. I select a patient subcohort of interest, such as patients who received treatment X. Does this subcohort have long survival times?

SEAS can also be used to infer, or annotate, an unknown clinical (metadata) attribute of a sample. For example:

  • I have a brain cancer patient whose survival time I do not know. Can I use SEAS to infer it? To do so, I define a subcohort that contains the patients most similar to this patient. The question then becomes the one above, which SEAS can answer.

SEAS also lets me view similar patients through an embedding.
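The inference idea above can be illustrated with a small sketch. It assumes a hypothetical 2-D embedding and hypothetical survival times for six toy patients, takes the k patients closest to a new patient as the subcohort, and compares the share of long survivors (more than 300 days) in that subcohort with the share in the whole population. The helper names `nearest_neighbors` and `long_survival_rate` are illustrative and not part of SEAS; SEAS performs the selection interactively and applies a statistical test rather than a raw comparison of rates.

```python
import math

# Hypothetical toy data: 2-D embedding coordinates and survival (days) per patient.
# These values are illustrative only, not SEAS output.
embedding = {
    "P1": (0.0, 0.0), "P2": (0.2, 0.1), "P3": (0.1, 0.3),
    "P4": (5.0, 5.0), "P5": (5.2, 4.9), "P6": (4.8, 5.1),
}
survival_days = {"P1": 420, "P2": 510, "P3": 380, "P4": 90, "P5": 150, "P6": 120}


def nearest_neighbors(query, embedding, k):
    """Return the IDs of the k samples closest to `query` (Euclidean distance)."""
    ranked = sorted(embedding, key=lambda sid: math.dist(query, embedding[sid]))
    return ranked[:k]


def long_survival_rate(sample_ids, survival_days, threshold=300):
    """Fraction of the given samples whose survival exceeds `threshold` days."""
    hits = sum(1 for sid in sample_ids if survival_days[sid] > threshold)
    return hits / len(sample_ids)


# A new patient with unknown survival, placed in the same embedding space.
new_patient = (0.15, 0.2)
subcohort = nearest_neighbors(new_patient, embedding, k=3)
print("subcohort:", subcohort)
print("subcohort rate:", long_survival_rate(subcohort, survival_days))
print("population rate:", long_survival_rate(list(embedding), survival_days))
```

If the subcohort's rate is clearly higher than the population's, the new patient plausibly belongs to the long-survival group; an enrichment p-value quantifies how unusual such a difference is.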


0.2 What is SEAS session workflow?

As shown in Figure 1, a SEAS session (https://aimed-lab.shinyapps.io/SEAS/) includes three steps:

The functional workflow is as follows:


0.3 What are the input and output formats?

The input files for SEAS are text tables. Columns can be separated by tabs or commas; tabs are preferred. We recommend preparing the input text file in Excel.
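Because both separators are accepted, a script that prepares or checks an input file can detect the separator from the header line. The sketch below assumes the file has a header row and uses hypothetical column names (`sample_id`, `age`, `days_to_death`); it relies only on Python's standard `csv` module and is not SEAS's own parser.

```python
import csv
import io


def read_sample_table(text):
    """Parse a tab- or comma-separated table into a list of row dicts.

    Assumption: the first line is a header. Tab is chosen whenever the
    header contains one, since tabs are the preferred separator.
    """
    header = text.splitlines()[0]
    delimiter = "\t" if "\t" in header else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical file contents in both accepted formats.
tab_text = "sample_id\tage\tdays_to_death\nS1\t54\t410\nS2\t61\t120\n"
comma_text = "sample_id,age,days_to_death\nS1,54,410\nS2,61,120\n"

# Both files yield the same rows; values are kept as strings.
print(read_sample_table(tab_text))
print(read_sample_table(comma_text))
```

In Excel, saving with the "Text (Tab delimited)" file type produces the tab-separated form.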


0.4 How to navigate through SEAS?

0.4.1 Embedding

Embedding is key to obtaining good results with SEAS. If the user does not provide an embedding input file, the user can choose either the t-SNE or the UMAP algorithm to embed the samples. Still, we encourage users to carefully prepare and examine the embedding before analyzing it with SEAS.
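Part of examining a prepared embedding is confirming that every sample in the clinical table has valid coordinates. The sketch below assumes a hypothetical tab-separated embedding file with columns `sample_id`, `x`, and `y`; the layout SEAS actually expects may differ, and `check_embedding` is an illustrative name.

```python
import csv
import io


def check_embedding(embedding_text, clinical_ids, delimiter="\t"):
    """Compare a (hypothetical) sample_id/x/y embedding table with clinical IDs.

    Returns clinical samples lacking usable coordinates, embedding rows with
    no clinical record, and rows whose x or y is not numeric.
    """
    coords = {}
    non_numeric = []
    for row in csv.DictReader(io.StringIO(embedding_text), delimiter=delimiter):
        try:
            coords[row["sample_id"]] = (float(row["x"]), float(row["y"]))
        except (TypeError, ValueError):
            # float() raises ValueError on text such as "NA" and TypeError on a
            # missing cell (DictReader fills short rows with None).
            non_numeric.append(row["sample_id"])
    clinical = set(clinical_ids)
    return {
        "missing_coordinates": sorted(clinical - set(coords)),
        "no_clinical_record": sorted(set(coords) - clinical),
        "non_numeric": non_numeric,
    }


# Hypothetical embedding file: S2 has a non-numeric x, S9 has no clinical record.
embedding_text = "sample_id\tx\ty\nS1\t0.1\t0.2\nS2\tNA\t0.5\nS9\t1.0\t1.0\n"
report = check_embedding(embedding_text, ["S1", "S2", "S3"])
print(report)
```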

0.4.2 Exploring data (optional)

SEAS lets users visualize relations between clinical features through grouped bar plots and scatter plots. Upon uploading the dataset, SEAS automatically identifies the data type of each clinotype and places it in a suitable plot. The scatter plot also includes a linear model prediction to visualize the correlation between two clinotypes.
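The two behaviors described above, choosing a plot type from each clinotype's data type and overlaying a linear model on a scatter plot, can be sketched as follows. The type rule, the missing-value markers `""` and `"NA"`, and all function names are assumptions for illustration; the line fitted here is an ordinary least-squares fit, which may differ in detail from what SEAS draws.

```python
# Assumption: a clinotype is "numeric" when every non-missing value parses as
# a number; SEAS's real type-detection rule is not documented here.
MISSING = {"", "NA"}


def is_numeric(values):
    """True if all non-missing values parse as floats (and at least one exists)."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return False
    try:
        for v in present:
            float(v)
    except ValueError:
        return False
    return True


def suggest_plot(values):
    """Numeric clinotypes go to scatter plots, categorical ones to grouped bar plots."""
    return "scatter" if is_numeric(values) else "grouped bar"


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Requires at least two distinct x values (otherwise sxx is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical clinotypes for four samples.
age = ["45", "52", "60", "71"]
grade = ["II", "III", "IV", "IV"]
days = [900.0, 700.0, 520.0, 300.0]
print(suggest_plot(age), suggest_plot(grade))  # scatter grouped bar

slope, intercept = linear_fit([float(a) for a in age], days)
print(f"days ~ {slope:.1f} * age + {intercept:.1f}")
```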

0.4.3 Subcohort selection

SEAS supports the following ways to select the subcohort:

  • Box selection: the user draws a bounding box that covers some samples in the embedding visualization. SEAS recognizes the samples inside the box as the subcohort.
  • Neighbor-point selection: in the embedding visualization, the user chooses a sample as the center and a radius, which together define a circle. SEAS recognizes all sample points inside the circle as the subcohort.
  • Entering sample selection: the user enters a list of sample identifiers into a box to define the subcohort.
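The three selection modes reduce to simple geometric or membership filters over the embedding coordinates. The sketch below assumes a hypothetical dictionary mapping sample IDs to 2-D coordinates, inclusive box and circle boundaries, Euclidean distance in embedding units, and pasted IDs separated by commas or newlines; SEAS's own handling of these details may differ, and the function names are illustrative.

```python
import math


def box_selection(embedding, x_min, x_max, y_min, y_max):
    """Samples whose (x, y) falls inside the axis-aligned box (edges included)."""
    return [
        sid for sid, (x, y) in embedding.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


def neighbor_selection(embedding, center_id, radius):
    """Samples within `radius` of the chosen center sample (center included)."""
    cx, cy = embedding[center_id]
    return [
        sid for sid, (x, y) in embedding.items()
        if math.hypot(x - cx, y - cy) <= radius
    ]


def id_selection(embedding, entered_text):
    """Split pasted IDs on commas or newlines; also report IDs not in the data."""
    requested = [s.strip() for s in entered_text.replace(",", "\n").splitlines()]
    requested = [s for s in requested if s]
    found = [s for s in requested if s in embedding]
    unknown = [s for s in requested if s not in embedding]
    return found, unknown


# Hypothetical 2-D embedding coordinates.
embedding = {"S1": (0.0, 0.0), "S2": (1.0, 0.0), "S3": (0.0, 2.0), "S4": (3.0, 3.0)}
print(box_selection(embedding, -0.5, 1.5, -0.5, 0.5))  # ['S1', 'S2']
print(neighbor_selection(embedding, "S1", 2.0))        # ['S1', 'S2', 'S3']
print(id_selection(embedding, "S3, S4\nS7"))            # (['S3', 'S4'], ['S7'])
```

Reporting unknown IDs, rather than silently dropping them, makes typos in a pasted list easy to spot.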

0.4.4 Understanding the result

SEAS presents the enrichment result in a table, typically as follows:

  • The first column is the feature name and the second column is the feature value. For example, the figure above shows ‘Discrete_days_to_death > 300’ (outcome: the patient survived for more than 300 days).
  • ‘# in the population’: the number of samples in the whole population that have the clinical outcome defined by the previous two columns.
  • ‘# in the selected cohort’: the number of samples in the selected subcohort that have the clinical outcome defined by the previous two columns.
  • p-value: the result of the statistical test for clinical enrichment. The smaller the p-value, the stronger the evidence that the clinical outcome is enriched in the selected subcohort.
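The test behind the p-value is not named here. A common choice for counts laid out like this table is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which asks how surprising it is to find at least the observed number of outcome-positive samples in a subcohort of that size. The sketch below assumes that test; besides the two count columns, it also needs the total population size and the subcohort size. The counts and the name `enrichment_p_value` are hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, population_hits, cohort_size, cohort_hits):
    """One-sided hypergeometric p-value: P(X >= cohort_hits).

    X counts outcome-positive samples when `cohort_size` samples are drawn
    without replacement from `population_size` samples, of which
    `population_hits` have the outcome.
    """
    upper = min(population_hits, cohort_size)
    tail = sum(
        comb(population_hits, i) * comb(population_size - population_hits, cohort_size - i)
        for i in range(cohort_hits, upper + 1)
    )
    return tail / comb(population_size, cohort_size)


# Hypothetical counts: 500 patients, 200 with 'Discrete_days_to_death > 300'
# ('# in the population'); a 40-patient subcohort contains 25 of them
# ('# in the selected cohort').
p = enrichment_p_value(500, 200, 40, 25)
print(f"p-value = {p:.3g}")
```

Python's integers are arbitrary precision, so `comb` stays exact even for large populations; only the final division produces a float.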

0.5 What are the current technical limitations?


0.6 How to contribute to SEAS?

We welcome users' feedback and contributed datasets for future SEAS development. Please email issues and sample datasets to the SEAS developers:

  • jakechen@uab.edu (Jake Chen, supervisor)
  • thamnguy@uab.edu (Thanh Nguyen, architect)
  • sbharti@uab.edu (Samuel Bharti, programmer)